Sampling-Based Speech Parameter Generation Using Moment-Matching Networks
Authors
Abstract
This paper presents sampling-based speech parameter generation using moment-matching networks for deep neural network (DNN)-based speech synthesis. People never produce exactly the same speech twice, even when they try to express the same linguistic and para-linguistic information, yet typical statistical speech synthesis produces exactly the same speech every time, i.e., synthetic speech has no inter-utterance variation. To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models from which speech parameters can be randomly sampled. The DNNs are trained so that the moments of the generated speech parameters are close to those of natural speech parameters. Because the variation of speech parameters is compressed into a simple, low-dimensional prior noise vector, our algorithm has a lower computation cost than direct sampling of speech parameters. As a first step towards generating synthetic speech with natural inter-utterance variation, this paper investigates whether the proposed sampling-based generation degrades synthetic speech quality. In the evaluation, we compare the speech quality of conventional maximum likelihood-based generation with that of the proposed sampling-based generation. The results demonstrate that the proposed generation causes no degradation in speech quality.
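The moment-matching criterion described in the abstract can be illustrated with a kernel-based discrepancy between natural and generated parameter vectors. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian kernel and a biased maximum mean discrepancy (MMD) estimate, and the names `gaussian_kernel`, `mmd_squared`, and `sigma` are hypothetical choices made for illustration. The abstract does not specify the kernel, the network architecture, or the feature dimensions.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b.

    Assumption: a Gaussian kernel is used only for illustration; the
    abstract does not say which kernel the paper adopts.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(a * a, axis=1)[:, None]
        + np.sum(b * b, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd_squared(natural, generated, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets.

    A small value means the (kernel-implied) moments of the generated
    samples are close to those of the natural samples.
    """
    k_nn = gaussian_kernel(natural, natural, sigma)
    k_gg = gaussian_kernel(generated, generated, sigma)
    k_ng = gaussian_kernel(natural, generated, sigma)
    return float(k_nn.mean() + k_gg.mean() - 2.0 * k_ng.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for speech parameter vectors (hypothetical sizes).
    natural = rng.standard_normal((200, 4))
    matched = rng.standard_normal((200, 4))
    shifted = rng.standard_normal((200, 4)) + 3.0
    print("matched distribution:", mmd_squared(natural, matched))
    print("shifted distribution:", mmd_squared(natural, shifted))
```

In a training loop, a quantity of this kind would serve as the loss between natural speech parameters and the outputs of a DNN fed with linguistic features plus a low-dimensional noise vector. At synthesis time, drawing a new noise vector would then yield a different sample of speech parameters.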
Similar resources
A Method for Removing Striping Error in Images from Linear-Array Sensors
In this paper, a destriping technique is proposed for pushbroom-type satellite imaging systems. The technique is based on the Moment Matching algorithm and assumes a linear response for each detector of the imaging system. In most work in this field, the offset parameter in the detectors’ response is neglected and images are corrected using only an estimate of the gain parameter. Proposed m...
Speech Enhancement using Adaptive Data-Based Dictionary Learning
In this paper, a speech enhancement method based on sparse representation of data frames has been presented. Speech enhancement is one of the most applicable areas in different signal processing fields. The objective of a speech enhancement system is improvement of either intelligibility or quality of the speech signals. This process is carried out using the speech signal processing techniques ...
Prediction of Gain in LD-CELP Using Hybrid Genetic/PSO-Neural Models
In this paper, the gain in the LD-CELP speech coding algorithm is predicted using three neural models that are equipped with genetic and particle swarm optimization (PSO) algorithms to optimize the structure and parameters of the neural networks. Elman, multi-layer perceptron (MLP) and fuzzy ARTMAP are the candidate neural models. The optimized number of nodes in the first and second hidden layers of El...
Phonological Mean Length of Utterance in 48-60-Month-old Persian-speaking Children with Isfahani Accent: Comparison of Story Generation and Conversation Samples
Objective: Phonological Mean Length of Utterance (PMLU), a quantitative measure for assessing phonological skills, has been considered in developmental studies as a diagnostic and clinical criterion in phonological development. Moreover, it is an indicator of the efficacy of intervention. The PMLU is a word-level measure that can be calculated on the child’s transcribed speech sampl...